A Chinese-English Organization Name Translation System Using Heuristic Web Mining and Asymmetric Alignment

Authors

  • Fan Yang
  • Jun Zhao
  • Kang Liu
Abstract

In this paper, we propose a novel system for translating organization names from Chinese to English with the assistance of web resources. First, we adopt a chunking-based segmentation method to improve the segmentation of Chinese organization names, which is plagued by the out-of-vocabulary (OOV) problem. Then a heuristic query construction method is employed to build an efficient query for retrieving bilingual Web pages that contain translation equivalents. Finally, we align the Chinese organization name with English sentences using the asymmetric alignment method to find the best English fragment as the translation equivalent. The experimental results show that the proposed method outperforms the baseline statistical machine translation system by 30.42%.
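The final step, picking the best English fragment for a whole Chinese name, can be illustrated with a toy sketch. This is not the paper's asymmetric alignment method, whose details the abstract does not give. It assumes a small hand-made bilingual lexicon and a hypothetical segmentation of the name, and it scores every contiguous English span by the F-measure of lexicon coverage. The function name `best_fragment` and the lexicon contents are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def best_fragment(
    source_words: List[str],
    target_tokens: List[str],
    lexicon: Dict[str, Set[str]],
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the English span [start, end) that
    best matches the segmented Chinese name under a toy coverage score."""
    if not source_words:
        return (0, 0, 0.0)

    lowered = [t.lower() for t in target_tokens]
    # Every English word that the lexicon lists for any source word.
    all_translations: Set[str] = set().union(
        *(lexicon.get(w, set()) for w in source_words)
    )

    best = (0, 0, 0.0)
    for start in range(len(lowered)):
        for end in range(start + 1, len(lowered) + 1):
            span = lowered[start:end]
            span_set = set(span)
            # Recall: share of source words with a translation in the span.
            covered = sum(
                1 for w in source_words if lexicon.get(w, set()) & span_set
            )
            # Precision: share of span tokens that translate some source word.
            matched = sum(1 for t in span if t in all_translations)
            recall = covered / len(source_words)
            precision = matched / len(span)
            if precision + recall == 0:
                continue
            score = 2 * precision * recall / (precision + recall)
            if score > best[2]:
                best = (start, end, score)
    return best


# Hypothetical segmentation and lexicon, for illustration only.
name = ["中国", "科学", "院"]
toy_lexicon = {
    "中国": {"chinese", "china"},
    "科学": {"science", "sciences"},
    "院": {"academy", "institute"},
}
sentence = "The Chinese Academy of Sciences was founded in 1949".split()

start, end, _ = best_fragment(name, sentence, toy_lexicon)
print(" ".join(sentence[start:end]))  # prints "Chinese Academy of Sciences"
```

The search is asymmetric only in the loose sense that the complete Chinese name is matched against a fragment of the English sentence rather than the whole sentence. A real system would replace the coverage score with a learned alignment model.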


Similar resources

A Practical Chinese-English ON Translation Method Based on ON's Distribution Characteristics on the Web

In this paper, we present a demo that translates Chinese organization names into English based on the input organization name's distribution characteristics on the web. Specifically, we first experimentally validate two assumptions that are often used in organization name translation using web resources. From experimental results, we find out several distribution characteristics of Chinese organizatio...

A Practical Chinese-English Organization Name Translation Method Based on Web Assistant

In those traditional organization name translation methods, researchers usually assumed that for every organization name to be translated, its correct translation would exist somewhere on the web. And some researchers further assumed that both the organization names to be translated and their correct translations would exist somewhere on some mix-language web pages. Thus these researchers think...

Engkoo: Mining the Web for Language Learning

This paper presents Engkoo, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an a...

Chinese-English Organization Name Translation Based on Correlative Expansion

This paper presents an approach to translating Chinese organization names into English based on correlative expansion. Firstly, some candidate translations are generated by using statistical translation method. And several correlative named entities for the input are retrieved from a correlative named entity list. Secondly, three kinds of expansion methods are used to generate some expanded que...

Exploiting the Web as Parallel Corpora for Cross-Language Information Retrieval

The expansion of the Web creates more requirements for Cross-Language Information Retrieval (CLIR). Query translation is the key problem. Previous studies have shown that query translation can be done by exploiting a large set of parallel texts. However, the problem arisen is the unavailability of large parallel corpora for many languages. In this paper, we describe a mining system that automat...


Publication date: 2009